Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EN DateTime V2] Added support for cases like "April ninth through 15th" (#2905) #2994

Merged
merged 3 commits into from
Jul 13, 2022

Conversation

aitelint
Copy link
Contributor

Fix to issue #2905.

Modified methods MergeTwoTimePoints in Base DatePeriod Extractor/Parser to support cases where one of the date points is an ordinal number.
Fixed issue in Hindi where "पहले" (first) was wrongly extracted from "से पहले" (before).

Test cases added to EN DateTimeModel.

@@ -346,7 +346,7 @@ private List<ExtractResult> ExtractImpl(string text, DateObject reference)
var simpleCasesResults = Token.MergeAllTokens(tokens, text, ExtractorName);
var ordinalExtractions = config.OrdinalExtractor.Extract(text);

tokens.AddRange(MergeTwoTimePoints(text, reference));
tokens.AddRange(MergeTwoTimePoints(text, new List<ExtractResult>(ordinalExtractions), reference));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to create a new list here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just following the same syntax used for the other methods in the list, will modify to pass directly the original list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the creation of new lists from the other extracting methods

if (er.Count < 2)
{
er = this.config.DateExtractor.Extract(this.config.TokenBeforeDate + text, referenceDate);
if (er.Count >= 2)
er.ForEach(o => o.Start -= this.config.TokenBeforeDate.Length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit brittle. Can you add a code comment explaining the rationale?
Also, could you run some perf tests? All this new extraction/parsing seems a bit heavy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't some equivalent logic be run in Merge instead of directly in Period?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented new approach by adding written ordinal numbers to the existing regexes, there is no perf impact in this way.

Test results for the mean test case execution time:
Base: (59.6 +/- 3.1)ms
Fix with OrdinalExtractor: (63.9 +/- 3.7)ms
Fix with regexes: (58.5 +/- 3.1)ms

{
var pr1Value = (DateTimeResolutionResult)pr1.Value;
var pr2Value = (DateTimeResolutionResult)pr2.Value;
var timexList1 = pr1.TimexStr.Split(Constants.DateTimexConnector[0]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why timex parsing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the timex of the ordinal number needs to be updated to refer to the same month of the other date.
It is similar to what happens to the year in the method SyncYear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed method

@tellarin tellarin merged commit f4b0e70 into microsoft:master Jul 13, 2022
@aitelint aitelint deleted the #2905 branch July 13, 2022 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants